Classified information: the data clustering problem
نویسندگان
چکیده
a combination of efficiency, emissions, noise levels, and other criteria. Researchers routinely classify documents as “relevant to the current project” or “irrelevant.” Genome decoding divides chromosomes into genes, regulatory regions, signals, and so on. Pathologists identify cells as cancerous or benign. We can classify data into different groups by clustering data that are close with respect to some distance measure. In this project, we investigate the design, use, and pitfalls of a popular clustering algorithm, the k-means algorithm.
منابع مشابه
Modified Convex Data Clustering Algorithm Based on Alternating Direction Method of Multipliers
Knowing the fact that the main weakness of the most standard methods including k-means and hierarchical data clustering is their sensitivity to initialization and trapping to local minima, this paper proposes a modification of convex data clustering in which there is no need to be peculiar about how to select initial values. Due to properly converting the task of optimization to an equivalent...
متن کاملDesign a Hybrid Recommender System Solving Cold-start Problem Using Clustering and Chaotic PSO Algorithm
One of the main challenges of increasing information in the new era, is to find information of interest in the mass of data. This important matter has been considered in the design of many sites that interact with users. Recommender systems have been considered to resolve this issue and have tried to help users to achieve their desired information; however, they face limitations. One of the mos...
متن کاملFuzzy clustering of time series data: A particle swarm optimization approach
With rapid development in information gathering technologies and access to large amounts of data, we always require methods for data analyzing and extracting useful information from large raw dataset and data mining is an important method for solving this problem. Clustering analysis as the most commonly used function of data mining, has attracted many researchers in computer science. Because o...
متن کاملA Hybrid Time Series Clustering Method Based on Fuzzy C-Means Algorithm: An Agreement Based Clustering Approach
In recent years, the advancement of information gathering technologies such as GPS and GSM networks have led to huge complex datasets such as time series and trajectories. As a result it is essential to use appropriate methods to analyze the produced large raw datasets. Extracting useful information from large data sets has always been one of the most important challenges in different sciences,...
متن کاملRepeated Record Ordering for Constrained Size Clustering
One of the main techniques used in data mining is data clustering, which has many applications in computer science, biology, and social sciences. Constrained clustering is a type of clustering in which side information provided by the user is incorporated into current clustering algorithms. One of the well researched constrained clustering algorithms is called microaggregation. In a microaggreg...
متن کاملA Graph-Based Clustering Approach to Identify Cell Populations in Single-Cell RNA Sequencing Data
Introduction: The emergence of single-cell RNA-sequencing (scRNA-seq) technology has provided new information about the structure of cells, and provided data with very high resolution of the expression of different genes for each cell at a single time. One of the main uses of scRNA-seq is data clustering based on expressed genes, which sometimes leads to the detection of rare cell populations. ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Computing in Science and Engineering
دوره 5 شماره
صفحات -
تاریخ انتشار 2003